Output files written to ../out.

Local functions

Load general data

Scores

Clinical

cTF and cTAF

Figures

Figure 1: Consort diagram

No code.

Figure 2: ROC curves

The TPR and FPR tables determine sensitivity and specificity using the score cutoffs as described in the manuscript. The sensitivity and specificity from the ROC curves generated by the pROC package are for visualization purposes only.
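As an illustration of the distinction, the sketch below tabulates sensitivity and specificity directly from a score cutoff, without any ROC curve. The helper name `rates_at_cutoff`, the toy scores, and the convention that a score at or above the cutoff is a positive call are assumptions made for this example, not the report's own code.

```r
# Hypothetical helper: sensitivity (TPR) and specificity (1 - FPR) at a fixed cutoff.
# Assumption: a sample is called positive when score >= cutoff.
rates_at_cutoff <- function(score, is_cancer, cutoff) {
  called <- score >= cutoff
  tp <- sum(called & is_cancer)   # cancers called positive
  fp <- sum(called & !is_cancer)  # non-cancers called positive
  c(
    sensitivity = tp / sum(is_cancer),
    specificity = 1 - fp / sum(!is_cancer)
  )
}

# Toy data (not from the study): 4 cancer and 5 non-cancer participants.
score     <- c(0.91, 0.85, 0.40, 0.30, 0.80, 0.20, 0.15, 0.10, 0.05)
is_cancer <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

res <- rates_at_cutoff(score, is_cancer, cutoff = 0.75)
print(res)
```

With these toy values, two of four cancers and one of five non-cancers exceed the cutoff, so the sketch returns a sensitivity of 0.5 and a specificity of 0.8.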

Training set

Training
classifier auc pauc n_cases n_controls pretty_label
Allelic imbalance 0.64 0.61 833 560 Allelic imbalance; AUC = 0.64, pAUC [Sp: 1-0.98] = 0.61
Clinical data 0.56 0.5 815 551 Clinical data; AUC = 0.56, pAUC [Sp: 1-0.98] = 0.5
Fragment endpoints 0.64 0.59 833 560 Fragment endpoints; AUC = 0.64, pAUC [Sp: 1-0.98] = 0.59
Fragment lengths 0.68 0.63 833 560 Fragment lengths; AUC = 0.68, pAUC [Sp: 1-0.98] = 0.63
SCNA 0.69 0.64 833 560 SCNA; AUC = 0.69, pAUC [Sp: 1-0.98] = 0.64
SCNA-WBC 0.7 0.64 833 560 SCNA-WBC; AUC = 0.7, pAUC [Sp: 1-0.98] = 0.64
SNV 0.63 0.58 833 560 SNV; AUC = 0.63, pAUC [Sp: 1-0.98] = 0.58
SNV-WBC 0.65 0.66 833 560 SNV-WBC; AUC = 0.65, pAUC [Sp: 1-0.98] = 0.66
WG methylation 0.73 0.67 833 560 WG methylation; AUC = 0.73, pAUC [Sp: 1-0.98] = 0.67

Validation set

Validation
classifier auc pauc n_cases n_controls pretty_label
Allelic imbalance 0.58 0.59 464 362 Allelic imbalance; AUC = 0.58, pAUC [Sp: 1-0.98] = 0.59
Clinical data 0.55 0.5 457 358 Clinical data; AUC = 0.55, pAUC [Sp: 1-0.98] = 0.5
Fragment endpoints 0.65 0.56 464 362 Fragment endpoints; AUC = 0.65, pAUC [Sp: 1-0.98] = 0.56
Fragment lengths 0.69 0.62 464 362 Fragment lengths; AUC = 0.69, pAUC [Sp: 1-0.98] = 0.62
Pan-feature 0.71 0.66 464 362 Pan-feature; AUC = 0.71, pAUC [Sp: 1-0.98] = 0.66
SCNA 0.68 0.6 464 362 SCNA; AUC = 0.68, pAUC [Sp: 1-0.98] = 0.6
SCNA-WBC 0.7 0.64 464 362 SCNA-WBC; AUC = 0.7, pAUC [Sp: 1-0.98] = 0.64
SNV 0.62 0.56 464 362 SNV; AUC = 0.62, pAUC [Sp: 1-0.98] = 0.56
SNV-WBC 0.68 0.64 464 362 SNV-WBC; AUC = 0.68, pAUC [Sp: 1-0.98] = 0.64
WG methylation 0.71 0.63 464 362 WG methylation; AUC = 0.71, pAUC [Sp: 1-0.98] = 0.63

Figure 3: Limit of detection (validation only)

Figure 4: CSO confusion matrices for joint detected

WG Methylation

SCNA

SNV-WBC

Figure 5: cTAF vs stage by cancer type (top 3 and ‘Remaining’ cancer types)

Supplementary Figures

Supplementary Figure 1: Detection by stage

Supplementary Figure 2: Limit of detection (Training and Validation)

Supplementary Figure 3: Upset plots of detection across 3 assays

True Positives, train

True Positives, validation

False Positives, train

False Positives, validation

Supplementary Figure 4: UMAP visualization of scores used in pan-feature classifier

Supplementary Figure 5: CSO confusion matrices, independent detection

WG methylation

SCNA

SNV-WBC

Supplementary Figure 6: cTAF by stage and cancer type (all cancer types)

Supplementary Figure 7: WG Methylation score vs cTAF fit

Supplementary Figure 8: Clonal hematopoiesis

Panel A: WBC vs cfDNA variant MAFs

Panel B: Mutation recurrence across participants

92.9% (11705 of 12601) of nonsynonymous SNVs in cfDNA that matched a variant in the same participant's WBC were private to an individual participant.
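A minimal sketch of how such a recurrence summary could be computed is shown below. The data frame `calls`, its column names, and the variant labels are hypothetical, and treating each distinct variant as the unit of counting is an assumption about how the percentage was defined.

```r
# Toy long-format table (hypothetical): one row per participant-variant pair.
calls <- data.frame(
  participant = c("P1", "P1", "P2", "P3", "P3", "P4"),
  variant     = c("v1", "v2", "v1", "v3", "v4", "v5"),
  stringsAsFactors = FALSE
)

# Number of distinct participants carrying each variant.
n_carriers <- tapply(calls$participant, calls$variant,
                     function(p) length(unique(p)))

# A variant is "private" when exactly one participant carries it.
frac_private <- mean(n_carriers == 1)
print(frac_private)

# The percentage quoted in the text follows from its two counts.
pct_reported <- round(100 * 11705 / 12601, 1)
print(pct_reported)
```

In the toy table only `v1` is shared, so four of the five variants are private.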

Tables

Table 1: Demographic Table

CCGA1 demographic table
Characteristic train_Invasive Cancer train_Non-cancer valid_Invasive Cancer valid_Non-cancer
Total (%) 854 (100%) 560 (100%) 485 (100%) 362 (100%)
Female 594 (70%) 436 (78%) 307 (63%) 235 (65%)
age (sd) 61 (12) 60 (12) 62 (12) 59 (14)
age 50+ 710 (83%) 452 (81%) 414 (85%) 274 (76%)
Race/Ethnicity
Black or African American 54 (6%) 46 (8%) 33 (7%) 25 (7%)
Hispanic 43 (5%) 29 (5%) 31 (6%) 22 (6%)
Other/unknown 20 (2%) 12 (2%) 21 (4%) 7 (2%)
White, non-Hispanic 737 (86%) 473 (84%) 400 (82%) 308 (85%)
Smoking status, n (%)
Ever-smoker/Missing 445 (52%) 238 (42%) 242 (50%) 179 (49%)
Never-smoker 409 (48%) 322 (57%) 243 (50%) 183 (51%)
Body Mass Index, n (%)
Missing 1 (0%) - - 1 (0%)
Normal/Underweight 237 (28%) 147 (26%) 139 (29%) 84 (23%)
Obesity 341 (40%) 233 (42%) 186 (38%) 154 (43%)
Overweight 275 (32%) 180 (32%) 160 (33%) 123 (34%)
Site Region, n (%)
Midwest 150 (18%) 83 (15%) 127 (26%) 64 (18%)
Northeast 46 (5%) 53 (9%) 26 (5%) 25 (7%)
South 491 (57%) 346 (62%) 228 (47%) 184 (51%)
West 167 (20%) 78 (14%) 104 (21%) 89 (25%)
Cancer Stage, n (%)
I 289 (34%) - 163 (34%) -
II 239 (28%) - 141 (29%) -
III 159 (19%) - 75 (15%) -
IV 157 (18%) - 93 (19%) -
Non-informative 10 (1%) - 13 (3%) -
Dx Method, n (%)
Clinical presentation 561 (66%) - 317 (65%) -
Screening 293 (34%) - 167 (34%) -

Table 2: Mapping of assays, features and classifiers

No code

Table 3: Performance metrics at 98% Specificity

Sensitivity at 98% specificity (post-hoc) for train/valid for each classifier
classifier_name Training set Validation set
Clinical data 2.7% (22/815) [1.7%-4.1%] 2.6% (12/457) [1.4%-4.5%]
SNV 19% (159/833) [16%-22%] 16% (75/464) [13%-20%]
Fragment endpoints 22% (181/833) [19%-25%] 18% (84/464) [15%-22%]
Allelic imbalance 25% (210/833) [22%-28%] 22% (101/464) [18%-26%]
SCNA 33% (271/833) [29%-36%] 27% (125/464) [23%-31%]
Fragment lengths 28% (236/833) [25%-32%] 29% (136/464) [25%-34%]
SCNA-WBC 33% (278/833) [30%-37%] 30% (139/464) [26%-34%]
SNV-WBC 36% (299/833) [33%-39%] 33% (155/464) [29%-38%]
WG methylation 39% (328/833) [36%-43%] 34% (158/464) [30%-39%]
Pan-feature NA 36% (165/464) [31%-40%]
Observed false positive rate (post-hoc) for train/valid of the classifiers
classifier_name Training set Validation set
Clinical data 2% (11/551) [1%-3.5%] 2.2% (8/358) [0.97%-4.4%]
Other classifiers 2.1% (12/560) [1.1%-3.7%] 2.2% (8/362) [0.96%-4.3%]
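The counts in Table 3 can be turned into point estimates with intervals using base R. The report does not name its interval method; the sketch below assumes an exact (Clopper-Pearson) 95% interval from `binom.test`, which looks consistent with the reported limits. The helper name `prop_with_ci` is hypothetical.

```r
# Assumption: exact (Clopper-Pearson) 95% interval; the report does not name its method.
prop_with_ci <- function(x, n, conf.level = 0.95) {
  bt <- binom.test(x, n, conf.level = conf.level)
  c(estimate = x / n, lower = bt$conf.int[1], upper = bt$conf.int[2])
}

# WG methylation, validation set: 158 detected of 464 cancers (from Table 3).
sens <- prop_with_ci(158, 464)

# Other classifiers, validation set: 8 false positives of 362 non-cancers.
fpr <- prop_with_ci(8, 362)

print(round(100 * sens, 1))
print(round(100 * fpr, 2))
```

The same helper applies to the per-stage detection counts reported later in this document.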

Supplementary Tables

Supplementary Table 1: cTAF logistic regression with AIC

Compare regression models w/ AIC included
Predictor Cancer type Cancer type + stage Cancer type + cTAF Cancer type + cTAF + stage
(Intercept) 0.3 3.9e-08 *** 4.1e-18 *** 5.4e-12 ***
Breast 0.019 * 0.9 0.53 0.32
Lung 0.038 * 0.03 * 0.62 0.48
Colon/Rectum 0.056 . 0.046 * 0.58 0.44
Stage II - 0.00018 *** - 0.48
Stage III - 2.3e-11 *** - 0.32
Stage IV - 2.4e-14 *** - 0.32
log10_ctaf - - 1.9e-19 *** 4.8e-17 ***
AIC 545.9 448.7 201.2 204.6
AIC - min(AIC) 344.7 247.5 0 3.4
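The last row of the table is a simple difference from the best (lowest) AIC. The sketch below reproduces it from the AIC values in the table; in practice these values would come from calling `AIC()` on each fitted model, which is not shown here because the model objects are not part of this document.

```r
# AIC values copied from the table above, one per candidate model.
aic <- c(
  "Cancer type"                = 545.9,
  "Cancer type + stage"        = 448.7,
  "Cancer type + cTAF"         = 201.2,
  "Cancer type + cTAF + stage" = 204.6
)

# Difference from the lowest AIC; 0 marks the preferred model.
delta_aic <- aic - min(aic)
print(round(delta_aic, 1))

best_model <- names(which.min(aic))
print(best_model)
```

Adding stage to the cancer type + cTAF model raises the AIC by 3.4, so the extra stage terms do not improve the fit by this criterion.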

Supplementary Table 2: pAUC

Partial AUC (validation set)
dataset pauc
Clinical data 0.5
Fragment endpoints 0.56
SNV 0.56
Allelic imbalance 0.59
SCNA 0.6
Fragment lengths 0.62
WG methylation 0.63
SCNA-WBC 0.64
SNV-WBC 0.64
Pan-feature 0.66
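The pAUC values start at 0.5 for the weakest classifier, which suggests a standardized partial AUC (McClish correction) of the kind pROC returns when `partial.auc.correct = TRUE`. That reading is an assumption. Under it, the rescaling for the specificity range 1 to 0.98 can be sketched as follows; the function name `standardize_pauc` is hypothetical.

```r
# McClish standardization of a raw partial AUC over specificity in [sp_low, 1].
# Assumption: the reported pAUC values use this correction.
standardize_pauc <- function(raw_pauc, sp_low = 0.98) {
  w <- 1 - sp_low          # width of the false positive rate range, here 0.02
  pauc_min <- w^2 / 2      # area under the chance diagonal over that range
  pauc_max <- w            # area under a perfect classifier over that range
  0.5 * (1 + (raw_pauc - pauc_min) / (pauc_max - pauc_min))
}

chance  <- standardize_pauc(0.02^2 / 2)
perfect <- standardize_pauc(0.02)
print(c(chance = chance, perfect = perfect))
```

On this scale a classifier no better than chance within the high-specificity region scores 0.5 and a perfect one scores 1, which makes values from different specificity ranges comparable.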

Additional details and computations used in manuscript

Score cutoffs

Train/valid score cutoffs for each classifier
classifier_name train valid
Allelic imbalance 0.5662 0.5792
Clinical data 0.7665 0.7395
Fragment endpoints 0.002322 0.002439
Fragment lengths 0.7325 0.7151
Pan-feature NA 0.8074
SCNA 0.722 0.7923
SCNA-WBC 0.6936 0.7058
SNV 0.6651 0.7501
SNV-WBC 0.5448 0.5448
WG methylation 0.7535 0.826

Detection by stage

Detection by stage
classifier_name train_or_valid I II III IV
Allelic imbalance train 6.0% [3.5-9.4]% (17/284) 10.2% [6.6-14.8]% (24/236) 38.5% [30.8-46.6]% (60/156) 69.4% [61.6-76.5]% (109/157)
Allelic imbalance valid 4.4% [1.8-8.8]% (7/160) 14.2% [8.9-21.1]% (20/141) 32.9% [22.1-45.1]% (23/70) 54.8% [44.2-65.2]% (51/93)
Fragment endpoints train 3.2% [1.5-5.9]% (9/284) 9.7% [6.3-14.3]% (23/236) 26.9% [20.1-34.6]% (42/156) 68.2% [60.3-75.4]% (107/157)
Fragment endpoints valid 3.8% [1.4-8.0]% (6/160) 7.1% [3.5-12.7]% (10/141) 24.3% [14.8-36.0]% (17/70) 54.8% [44.2-65.2]% (51/93)
Fragment lengths train 3.2% [1.5-5.9]% (9/284) 16.5% [12.0-21.9]% (39/236) 45.5% [37.5-53.7]% (71/156) 74.5% [67.0-81.1]% (117/157)
Fragment lengths valid 6.2% [3.0-11.2]% (10/160) 17.0% [11.2-24.3]% (24/141) 52.9% [40.6-64.9]% (37/70) 69.9% [59.5-79.0]% (65/93)
Pan-feature valid 8.8% [4.9-14.2]% (14/160) 19.9% [13.6-27.4]% (28/141) 64.3% [51.9-75.4]% (45/70) 83.9% [74.8-90.7]% (78/93)
SCNA train 3.5% [1.7-6.4]% (10/284) 21.2% [16.2-27.0]% (50/236) 55.1% [47.0-63.1]% (86/156) 79.6% [72.5-85.6]% (125/157)
SCNA valid 5.6% [2.6-10.4]% (9/160) 15.6% [10.0-22.7]% (22/141) 50.0% [37.8-62.2]% (35/70) 63.4% [52.8-73.2]% (59/93)
SCNA-WBC train 5.6% [3.3-9.0]% (16/284) 20.3% [15.4-26.0]% (48/236) 55.8% [47.6-63.7]% (87/156) 80.9% [73.9-86.7]% (127/157)
SCNA-WBC valid 4.4% [1.8-8.8]% (7/160) 18.4% [12.4-25.8]% (26/141) 54.3% [41.9-66.3]% (38/70) 73.1% [62.9-81.8]% (68/93)
SNV train 2.1% [0.8-4.5]% (6/284) 8.9% [5.6-13.3]% (21/236) 31.4% [24.2-39.3]% (49/156) 52.9% [44.8-60.9]% (83/157)
SNV valid 1.9% [0.4-5.4]% (3/160) 11.3% [6.6-17.8]% (16/141) 28.6% [18.4-40.6]% (20/70) 38.7% [28.8-49.4]% (36/93)
SNV-WBC train 5.6% [3.3-9.0]% (16/284) 23.3% [18.1-29.2]% (55/236) 61.5% [53.4-69.2]% (96/156) 84.1% [77.4-89.4]% (132/157)
SNV-WBC valid 8.8% [4.9-14.2]% (14/160) 19.1% [13.0-26.6]% (27/141) 57.1% [44.7-68.9]% (40/70) 79.6% [69.9-87.2]% (74/93)
WG methylation train 7.7% [4.9-11.5]% (22/284) 29.2% [23.5-35.5]% (69/236) 65.4% [57.4-72.8]% (102/156) 86.0% [79.6-91.0]% (135/157)
WG methylation valid 8.1% [4.4-13.5]% (13/160) 19.9% [13.6-27.4]% (28/141) 57.1% [44.7-68.9]% (40/70) 82.8% [73.6-89.8]% (77/93)

LOD

LOD by classifier for train/valid
classifier_name train_or_valid lod p_detection lod_lcb lod_ucb specificity n_obs study
Allelic imbalance train 0.007932 0.5 0.005645 0.01115 SnAtSp=0.98 296 CCGA1
Allelic imbalance valid 0.007827 0.5 0.004465 0.01372 SnAtSp=0.98 113 CCGA1
Fragment endpoints train 0.01205 0.5 0.008126 0.01786 SnAtSp=0.98 296 CCGA1
Fragment endpoints valid 0.01937 0.5 0.009789 0.03833 SnAtSp=0.98 113 CCGA1
Fragment lengths train 0.004095 0.5 0.002865 0.005854 SnAtSp=0.98 296 CCGA1
Fragment lengths valid 0.003154 0.5 0.001835 0.005421 SnAtSp=0.98 113 CCGA1
Pan-feature valid 0.000889 0.5 0.000585 0.001352 SnAtSp=0.98 113 CCGA1
SCNA train 0.002665 0.5 0.001814 0.003916 SnAtSp=0.98 296 CCGA1
SCNA valid 0.003947 0.5 0.002019 0.007718 SnAtSp=0.98 113 CCGA1
SCNA-WBC train 0.001618 0.5 0.001144 0.00229 SnAtSp=0.98 296 CCGA1
SCNA-WBC valid 0.0025 0.5 0.001425 0.004384 SnAtSp=0.98 113 CCGA1
SNV train 0.01868 0.5 0.01199 0.02908 SnAtSp=0.98 296 CCGA1
SNV valid 0.01634 0.5 0.008032 0.03325 SnAtSp=0.98 113 CCGA1
SNV-WBC train 0.001257 0.5 0.000971 0.001628 SnAtSp=0.98 296 CCGA1
SNV-WBC valid 0.001184 0.5 0.000746 0.001879 SnAtSp=0.98 113 CCGA1
WG methylation train 0.000853 0.5 0.000652 0.001115 SnAtSp=0.98 296 CCGA1
WG methylation valid 0.001241 0.5 0.000805 0.001913 SnAtSp=0.98 113 CCGA1
Targeted methylation (Second CCGA substudy) valid 0.000131 0.5 0.000103 0.000166 SnAtSp=0.98 559 CCGA2
LOD relative to WG methylation
classifier_name train_or_valid lod lod_meth lod_relative_to_meth
Allelic imbalance valid 0.007827 0.001241 6.307
Fragment endpoints valid 0.01937 0.001241 15.61
Fragment lengths valid 0.003154 0.001241 2.541
Pan-feature valid 0.000889 0.001241 0.7164
SCNA valid 0.003947 0.001241 3.18
SCNA-WBC valid 0.0025 0.001241 2.015
SNV valid 0.01634 0.001241 13.17
SNV-WBC valid 0.001184 0.001241 0.9541
WG methylation valid 0.001241 0.001241 1
Targeted methylation (Second CCGA substudy) valid 0.000131 0.001241 0.1056
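The `lod_relative_to_meth` column is each classifier's validation LOD divided by the WG methylation validation LOD. A short sketch using a subset of the rounded values from the table (the report itself presumably used unrounded values):

```r
# Validation-set LODs copied from the table above (p_detection = 0.5).
lod <- c(
  "Allelic imbalance"  = 0.007827,
  "Fragment endpoints" = 0.01937,
  "Pan-feature"        = 0.000889,
  "SNV-WBC"            = 0.001184,
  "WG methylation"     = 0.001241
)

# Ratio to the WG methylation LOD; values below 1 indicate a lower LOD.
lod_relative_to_meth <- lod / lod[["WG methylation"]]
print(signif(lod_relative_to_meth, 4))
```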

Comparison of confusion matrices

Joint confusion matrices comparison
wh_assay n_correct n_total prec_overall
SCNA 52 127 40.94
SNV-WBC 44 127 34.65
WG methylation 95 127 74.8

Overall CSO accuracy relative to Methylation:
WG methylation:SCNA = 182.7% (95/52)
WG methylation:SNV-WBC = 215.9% (95/44)
WG methylation:WG methylation = 100.0% (95/95)
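Both the `prec_overall` column and the relative accuracies follow from the correct-call counts. A short sketch using the counts above:

```r
# Correct CSO calls per assay among the 127 jointly detected participants.
n_correct <- c("SCNA" = 52, "SNV-WBC" = 44, "WG methylation" = 95)
n_total <- 127

# Overall accuracy per assay, in percent.
prec_overall <- 100 * n_correct / n_total
print(round(prec_overall, 2))

# WG methylation correct calls relative to each assay, in percent.
meth_ratio <- 100 * n_correct[["WG methylation"]] / n_correct
print(round(meth_ratio, 1))
```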

Correct CSO: WG methylation (rows) vs. SCNA (cols), McNemar P-value = 8e-09
no yes
no 27 5
yes 48 47
Correct CSO: WG methylation (rows) vs. SNV-WBC (cols), McNemar P-value = 6.5e-12
no yes
no 31 1
yes 52 43
Correct CSO: SCNA (rows) vs. SNV-WBC (cols), McNemar P-value = 0.35
no yes
no 51 24
yes 32 20
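The p-values above compare paired correct/incorrect calls on the same participants. As a sketch, the first table can be passed to base R's `mcnemar.test`; using its default continuity correction is an assumption, although it should reproduce the reported value of about 8e-09.

```r
# Paired 2x2 table copied from above: was the CSO call correct with each assay?
tab <- matrix(
  c(27, 5,
    48, 47),
  nrow = 2, byrow = TRUE,
  dimnames = list(wg_methylation = c("no", "yes"), scna = c("no", "yes"))
)

# Default is correct = TRUE (continuity correction).
res <- mcnemar.test(tab)
print(res$statistic)
print(signif(res$p.value, 2))
```

Only the discordant cells (5 and 48) enter the statistic; the 27 and 47 participants on whom both assays agree do not affect it.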

Spearman correlation: cTAF vs. stage

Compute correlation between cTAF and stage for each cancer type.

Spearman rank correlation test (per tissue)
cancer_type estimate n p.value p.value_adjust stars
Lung 0.4147 40 0.007793 0.007793 **
Colon/Rectum 0.7702 48 1.552e-10 2.07e-10 ***
Breast 0.4837 162 7.034e-11 1.407e-10 ***
Remaining 0.5826 159 7.906e-16 3.163e-15 ***
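A minimal sketch of the two steps follows: a Spearman test for one cancer type, then multiple-testing adjustment across cancer types. The toy `stage` and `log10_ctaf` vectors are hypothetical. The adjustment method is not stated in this document; Benjamini-Hochberg is assumed here because it reproduces the `p.value_adjust` column from the reported raw p-values.

```r
# Toy data (hypothetical): stage coded 1-4 and log10 cTAF for one cancer type.
stage      <- c(1, 1, 2, 2, 3, 3, 4, 4)
log10_ctaf <- c(-4.0, -3.5, -3.6, -3.0, -2.8, -2.2, -1.5, -1.0)

# exact = FALSE because stage contains ties, so an exact p-value is unavailable.
fit <- cor.test(stage, log10_ctaf, method = "spearman", exact = FALSE)
rho <- unname(fit$estimate)
print(rho)

# Raw per-tissue p-values copied from the table above.
p_raw <- c(Lung = 0.007793, "Colon/Rectum" = 1.552e-10,
           Breast = 7.034e-11, Remaining = 7.906e-16)

# Assumption: Benjamini-Hochberg adjustment.
p_bh <- p.adjust(p_raw, method = "BH")
print(signif(p_bh, 4))
```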

Detection point estimate comparison to WG methylation

McNemar P-values for the difference in detection between each classifier and WG methylation (validation set only).

Validation set WG methylation (rows) vs. Allelic imbalance (cols), McNemar P-value = 3.8e-12
cancer non_cancer
cancer 97 61
non_cancer 4 302
Validation set WG methylation (rows) vs. SNV (cols), McNemar P-value = 1.5e-18
cancer non_cancer
cancer 73 85
non_cancer 2 304
Validation set WG methylation (rows) vs. SNV-WBC (cols), McNemar P-value = 0.73
cancer non_cancer
cancer 140 18
non_cancer 15 291
Validation set WG methylation (rows) vs. SCNA (cols), McNemar P-value = 7.4e-06
cancer non_cancer
cancer 116 42
non_cancer 9 297
Validation set WG methylation (rows) vs. SCNA-WBC (cols), McNemar P-value = 0.0017
cancer non_cancer
cancer 132 26
non_cancer 7 299
Validation set WG methylation (rows) vs. Fragment endpoints (cols), McNemar P-value = 7.5e-16
cancer non_cancer
cancer 80 78
non_cancer 4 302
Validation set WG methylation (rows) vs. Fragment lengths (cols), McNemar P-value = 0.0012
cancer non_cancer
cancer 126 32
non_cancer 10 296
Validation set WG methylation (rows) vs. Clinical data (cols), McNemar P-value = 1e-30
cancer non_cancer
cancer 7 149
non_cancer 5 296
Validation set WG methylation (rows) vs. Pan-feature (cols), McNemar P-value = 0.096
cancer non_cancer
cancer 155 3
non_cancer 10 296
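Each table above cross-tabulates two classifiers' calls on the same participants. The sketch below shows one way such a paired table might be built from per-participant calls and how the continuity-corrected statistic depends only on the discordant cells. The call vectors and the helper `to_call` are hypothetical toy inputs.

```r
# Toy paired calls (hypothetical): TRUE means the classifier called "cancer".
wg_meth <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
other   <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Fix the level order so rows and columns match the layout used above.
to_call <- function(x) {
  factor(ifelse(x, "cancer", "non_cancer"), levels = c("cancer", "non_cancer"))
}
tab <- table(wg_meth = to_call(wg_meth), other = to_call(other))
print(tab)

# Discordant cells: called by one classifier but not the other.
n_meth_only  <- tab["cancer", "non_cancer"]
n_other_only <- tab["non_cancer", "cancer"]

# Continuity-corrected McNemar statistic and its chi-squared p-value (1 df).
stat <- (abs(n_meth_only - n_other_only) - 1)^2 / (n_meth_only + n_other_only)
p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)
print(p_manual)
```

The manual p-value matches `mcnemar.test(tab)$p.value`, since that function applies the same correction by default.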

Output figs

All output figures are written here except the UpSet plots, which are written separately.

Reproducibility

R version 4.1.2 (2021-11-01)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=C.UTF-8, LC_NUMERIC=C, LC_TIME=C.UTF-8, LC_COLLATE=C.UTF-8, LC_MONETARY=C.UTF-8, LC_MESSAGES=C.UTF-8, LC_PAPER=C.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=C.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ggplot2(v.3.3.6)

loaded via a namespace (and not attached): Rcpp(v.1.0.9), lattice(v.0.20-45), tidyr(v.1.2.0), digest(v.0.6.29), packrat(v.0.6.0), utf8(v.1.2.2), R6(v.2.5.1), plyr(v.1.8.6), backports(v.1.4.1), evaluate(v.0.15), assertr(v.2.8), highr(v.0.9), pillar(v.1.8.0), rlang(v.1.0.4), scam(v.1.2-12), jquerylib(v.0.1.4), hexbin(v.1.28.2), Matrix(v.1.3-4), rmarkdown(v.2.8), textshaping(v.0.3.6), labeling(v.0.4.2), splines(v.4.1.2), readr(v.2.0.1), stringr(v.1.4.0), pander(v.0.6.3), bit(v.4.0.4), munsell(v.0.5.0), broom(v.1.0.0), compiler(v.4.1.2), xfun(v.0.31), systemfonts(v.1.0.4), pkgconfig(v.2.0.3), mgcv(v.1.8-36), htmltools(v.0.5.2), tidyselect(v.1.1.2), tibble(v.3.1.8), gridExtra(v.2.3), fansi(v.1.0.3), crayon(v.1.5.1), dplyr(v.1.0.9), tzdb(v.0.1.2), withr(v.2.5.0), ggpubr(v.0.2.3), grid(v.4.1.2), nlme(v.3.1-152), jsonlite(v.1.8.0), gtable(v.0.3.0), lifecycle(v.1.0.1), magrittr(v.2.0.3), pROC(v.1.16.2), scales(v.1.2.0), cli(v.3.3.0), stringi(v.1.7.8), vroom(v.1.5.4), cachem(v.1.0.6), farver(v.2.1.1), ggsignif(v.0.6.2), bslib(v.0.4.0), ragg(v.1.2.2), ellipsis(v.0.3.2), generics(v.0.1.3), vctrs(v.0.4.1), cowplot(v.1.1.1), tools(v.4.1.2), bit64(v.4.0.5), glue(v.1.6.2), purrr(v.0.3.4), hms(v.1.1.0), parallel(v.4.1.2), fastmap(v.1.1.0), yaml(v.2.3.5), colorspace(v.2.0-3), UpSetR(v.1.3.3), knitr(v.1.39) and sass(v.0.4.2)